
    Bayesian Multimodel Inference for Geostatistical Regression Models

    The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance
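
    The covariate-selection idea in this abstract can be illustrated with a toy, non-spatial sketch: posterior probabilities over covariate subsets with unequal prior inclusion weights. Here a BIC approximation stands in for the paper's MCMC integration, and `model_posteriors` and its arguments are hypothetical names, not the authors' code.

```python
import itertools
import numpy as np

def model_posteriors(X, y, prior_incl):
    """Posterior probabilities over covariate subsets (toy, non-spatial sketch).

    prior_incl[j] is the a priori inclusion probability of covariate j,
    allowing unequal weighting of covariates; marginal likelihoods are
    approximated with exp(-BIC/2) instead of the paper's MCMC integration.
    """
    n, p = X.shape
    bics, priors = {}, {}
    for subset in itertools.product([0, 1], repeat=p):
        cols = [j for j, s in enumerate(subset) if s]
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = np.sum((y - Z @ beta) ** 2)
        bics[subset] = n * np.log(rss / n) + Z.shape[1] * np.log(n)
        priors[subset] = np.prod([prior_incl[j] if s else 1 - prior_incl[j]
                                  for j, s in enumerate(subset)])
    bmin = min(bics.values())  # subtract min BIC for numerical stability
    weights = {m: np.exp(-0.5 * (bics[m] - bmin)) * priors[m] for m in bics}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}
```

    Unequal prior weights simply tilt the posterior toward covariates believed a priori to matter, something AIC-style selection cannot express.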

    Bootstrap-after-Bootstrap Model Averaging for Reducing Model Uncertainty in Model Selection for Air Pollution Mortality Studies

    Background: Concerns have been raised about findings of associations between particulate matter (PM) air pollution and mortality that have been based on a single "best" model arising from a model selection procedure, because such a strategy may ignore model uncertainty inherently involved in searching through a set of candidate models to find the best model. Model averaging has been proposed as a method of allowing for model uncertainty in this context. Objectives: To propose an extension (double BOOT) to a previously described bootstrap model-averaging procedure (BOOT) for use in time series studies of the association between PM and mortality. We compared double BOOT and BOOT with Bayesian model averaging (BMA) and a standard method of model selection [standard Akaike's information criterion (AIC)]. Method: Actual time series data from the United States are used to conduct a simulation study to compare and contrast the performance of double BOOT, BOOT, BMA, and standard AIC. Results: Double BOOT produced estimates of the effect of PM on mortality that had smaller root mean squared error than those produced by BOOT, BMA, and standard AIC. This performance boost resulted from estimates produced by double BOOT having smaller variance than those produced by BOOT and BMA. Conclusions: Double BOOT is a viable alternative to BOOT and BMA for producing estimates of the mortality effect of PM. Keywords: air pollution, Bayesian, bootstrap, model averaging, mortality, particulate matter. Environ Health Perspect 118:131–136 (2010). doi:10.1289/ehp.0901007
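
    A single-level bootstrap model-averaging pass (BOOT-style; the paper's double BOOT adds a second resampling layer) can be sketched as follows. The function name, the linear-model candidates, and the Gaussian AIC formula are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def boot_model_average(X, y, candidate_cols, target_col, n_boot=200, seed=0):
    """One-level bootstrap model averaging (BOOT-style sketch).

    For each resample, the candidate model with the lowest AIC is selected
    and the coefficient of `target_col` from that winning model is recorded;
    averaging over resamples averages over model-selection uncertainty.
    Every candidate column subset is assumed to include target_col.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        best_aic, best_est = np.inf, None
        for cols in candidate_cols:
            Z = np.column_stack([np.ones(n)] + [Xb[:, j] for j in cols])
            beta, *_ = np.linalg.lstsq(Z, yb, rcond=None)
            rss = np.sum((yb - Z @ beta) ** 2)
            aic = n * np.log(rss / n) + 2 * Z.shape[1]
            if aic < best_aic:
                best_aic = aic
                best_est = beta[1 + cols.index(target_col)]
        estimates.append(best_est)
    return np.mean(estimates), np.std(estimates, ddof=1)
```

    The bootstrap spread of the selected-model estimates is what captures the extra variance that a single "best" model would hide.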

    Fuzzy Fibers: Uncertainty in dMRI Tractography

    Fiber tracking based on diffusion weighted Magnetic Resonance Imaging (dMRI) allows for noninvasive reconstruction of fiber bundles in the human brain. In this chapter, we discuss sources of error and uncertainty in this technique, and review strategies that afford a more reliable interpretation of the results. This includes methods for computing and rendering probabilistic tractograms, which estimate precision in the face of measurement noise and artifacts. However, we also address aspects that have received less attention so far, such as model selection, partial voluming, and the impact of parameters, both in preprocessing and in fiber tracking itself. We conclude by giving impulses for future research

    Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines

    Background: Multiple imputation (MI) provides an effective approach to handle missing covariate data within prognostic modelling studies, as it can properly account for the missing data uncertainty. The multiply imputed datasets are each analysed using standard prognostic modelling techniques to obtain the estimates of interest. The estimates from each imputed dataset are then combined into one overall estimate and variance, incorporating both the within and between imputation variability. Rubin's rules for combining these multiply imputed estimates are based on asymptotic theory. The resulting combined estimates may be more accurate if the posterior distribution of the population parameter of interest is better approximated by the normal distribution. However, the normality assumption may not be appropriate for all the parameters of interest when analysing prognostic modelling studies, such as predicted survival probabilities and model performance measures. Methods: Guidelines for combining the estimates of interest when analysing prognostic modelling studies are provided. A literature review is performed to identify current practice for combining such estimates in prognostic modelling studies. Results: Methods for combining all reported estimates after MI were not well reported in the current literature. Rubin's rules without applying any transformations were the standard approach used, when any method was stated. Conclusion: The proposed simple guidelines for combining estimates after MI may lead to a wider and more appropriate use of MI in future prognostic modelling studies
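
    Rubin's rules referenced above have a compact form: the pooled point estimate is the mean across imputations, and the total variance combines within- and between-imputation components. A minimal sketch (the helper name is ours; transformations for bounded quantities such as survival probabilities would be applied before pooling):

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Combine estimates from m multiply imputed datasets via Rubin's rules."""
    m = len(estimates)
    qbar = np.mean(estimates)          # pooled point estimate
    w = np.mean(variances)             # within-imputation variance
    b = np.var(estimates, ddof=1)      # between-imputation variance
    t = w + (1 + 1 / m) * b            # total variance
    return qbar, t
```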

    A heteroskedastic error covariance matrix estimator using a first-order conditional autoregressive Markov simulation for deriving asymptotical efficient estimates from ecological sampled Anopheles arabiensis aquatic habitat covariates

    Background: Autoregressive regression coefficients for Anopheles arabiensis aquatic habitat models are usually assessed using global error techniques and are reported as error covariance matrices. A global statistic, however, will summarize error estimates from multiple habitat locations. This makes it difficult to identify where there are clusters of An. arabiensis aquatic habitats of acceptable prediction. It is therefore useful to conduct some form of spatial error analysis to detect clusters of An. arabiensis aquatic habitats based on uncertainty residuals from individual sampled habitats. In this research, a method of error estimation for spatial simulation models was demonstrated using autocorrelation indices and eigenfunction spatial filters to distinguish among the effects of parameter uncertainty on a stochastic simulation of ecologically sampled Anopheles aquatic habitat covariates. A test for diagnostic checking of error residuals in an An. arabiensis aquatic habitat model may enable intervention efforts targeting productive habitat clusters, based on larval/pupal productivity, by using the asymptotic distribution of parameter estimates from a residual autocovariance matrix. The models considered in this research extend a normal regression analysis previously considered in the literature. Methods: Field and remote-sampled data were collected from July 2006 to December 2007 in the Karima rice-village complex in Mwea, Kenya. SAS 9.1.4® was used to explore univariate statistics, correlations, and distributions, and to generate global autocorrelation statistics from the ecologically sampled datasets. A local autocorrelation index was also generated using spatial covariance parameters (i.e., Moran's indices) in a SAS/GIS® database. The Moran's statistic was decomposed into orthogonal and uncorrelated synthetic map pattern components using a Poisson model with a gamma-distributed mean (i.e., negative binomial regression). The eigenfunction values from the spatial configuration matrices were then used to define expectations for prior distributions using a Markov chain Monte Carlo (MCMC) algorithm. A set of posterior means was defined in WinBUGS 1.4.3®. After the model had converged, samples from the conditional distributions were used to summarize the posterior distribution of the parameters. Thereafter, a spatial residual trend analysis was used to evaluate variance uncertainty propagation in the model using an autocovariance error matrix. Results: By specifying coefficient estimates in a Bayesian framework, the covariate number of tillers was found to be a significant predictor, positively associated with An. arabiensis aquatic habitats. The spatial filter models accounted for approximately 19% redundant locational information in the ecologically sampled An. arabiensis aquatic habitat data. In the residual error estimation model there was significant positive autocorrelation (i.e., clustering of habitats in geographic space) based on log-transformed larval/pupal data and the sampled covariate depth of habitat. Conclusion: An autocorrelation error covariance matrix and a spatial filter analysis can prioritize mosquito control strategies by providing a computationally attractive and feasible description of variance uncertainty estimates for correctly identifying clusters of prolific An. arabiensis aquatic habitats based on larval/pupal productivity.
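
    The global Moran's index used in the Methods above can be computed directly from site values and a spatial weights matrix. This minimal sketch omits the eigenfunction decomposition and is not the authors' SAS/GIS pipeline:

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I for values at n sites with an n x n spatial weights
    matrix W. Values near +1 indicate clustering of similar values in space,
    values near -1 indicate checkerboard-like dispersion."""
    z = values - values.mean()                 # deviations from the mean
    n = len(values)
    return (n * (z @ W @ z)) / (W.sum() * (z @ z))
```

    On a 4-site ring with alternating values the statistic is exactly -1, the fully dispersed case.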

    Bayesian probabilistic network modeling from multiple independent replicates

    Often protein (or gene) time-course data are collected for multiple replicates. Each replicate generally has sparse data with the number of time points being less than the number of proteins. Usually each replicate is modeled separately. However, here all the information in each of the replicates is used to make a composite inference about signal networks. The composite inference comes from combining well structured Bayesian probabilistic modeling with a multi-faceted Markov Chain Monte Carlo algorithm. Based on simulations which investigate many different types of network interactions and experimental variabilities, the composite examination uncovers many important relationships within the networks. In particular, when the edge's partial correlation between two proteins is at least moderate, then the composite's posterior probability is large
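
    Partial correlation as an edge-strength measure can be read off the inverse covariance (precision) matrix. This is a simple frequentist stand-in for the abstract's Bayesian posterior edge probabilities, with hypothetical naming:

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation matrix from samples (rows = observations,
    columns = proteins). The correlation between two proteins, controlling
    for all others, is -prec_ij / sqrt(prec_ii * prec_jj)."""
    prec = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor
```

    In a chain x -> y -> z, the partial correlation between x and z given y is near zero, so only the direct edges survive, which is the behavior the network inference relies on.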

    Integrating Factor Analysis and a Transgenic Mouse Model to Reveal a Peripheral Blood Predictor of Breast Tumors

    Background: Transgenic mouse tumor models have the advantage of facilitating controlled in vivo oncogenic perturbations in a common genetic background. This provides an idealized context for generating transcriptome-based diagnostic models while minimizing the inherent noisiness of high-throughput technologies. However, the question remains whether models developed in such a setting are suitable prototypes for useful human diagnostics. We show that latent factor modeling of the peripheral blood transcriptome in a mouse model of breast cancer provides the basis for using computational methods to link a mouse model to a prototype human diagnostic based on a common underlying biological response to the presence of a tumor. Methods: We used gene expression data from mouse peripheral blood cell (PBC) samples to identify significantly differentially expressed genes using supervised classification and sparse ANOVA. We employed these transcriptome data as the starting point for developing a breast tumor predictor from human peripheral blood mononuclear cells (PBMCs) by using a factor modeling approach. Results: The predictor distinguished breast cancer patients from healthy individuals, in a cohort of patients independent from that used to build the factors and train the model, with 89% sensitivity, 100% specificity and an area under the curve (AUC) of 0.97, using Youden's J-statistic to objectively select the model's classification threshold. Both permutation testing of the model and evaluating the model strategy by swapping the training and validation sets highlight its stability. Conclusions: We describe a human breast tumor predictor based on the gene expression of mouse PBCs. This strategy overcomes many of the limitations of earlier studies by using the model system to reduce noise and identify transcripts associated with the presence of a breast tumor over other potentially confounding factors. Our results serve as a proof-of-concept for using an animal model to develop a blood-based diagnostic, and they establish an experimental framework for identifying predictors of solid tumors, not only in the context of breast cancer, but also in other types of cancer.
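
    Youden's J-statistic threshold selection, used above to set the classifier's operating point, is simple to sketch. This generic version assumes higher scores indicate "tumor" and is not the authors' code:

```python
import numpy as np

def youden_threshold(scores, labels):
    """Pick the classification threshold maximizing J = sensitivity + specificity - 1."""
    best_j, best_t = -1.0, None
    for t in np.unique(scores):
        pred = scores >= t
        sens = np.mean(pred[labels == 1])     # true positive rate
        spec = np.mean(~pred[labels == 0])    # true negative rate
        j = sens + spec - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

    On perfectly separable scores the chosen threshold sits at the lowest positive-class score and J reaches its maximum of 1.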

    Dose–responses from multi-model inference for the non-cancer disease mortality of atomic bomb survivors

    The non-cancer mortality data for cerebrovascular disease (CVD) and cardiovascular diseases from Report 13 on the atomic bomb survivors published by the Radiation Effects Research Foundation were analysed to investigate the dose–response for the influence of radiation on these detrimental health effects. Various parametric and categorical models (such as linear-no-threshold (LNT) and a number of threshold and step models) were analysed with a statistical selection protocol that rated the model description of the data. Instead of applying the usual approach of identifying one preferred model for each data set, a set of plausible models was applied, and a sub-set of non-nested models was identified that all fitted the data about equally well. Subsequently, this sub-set of non-nested models was used to perform multi-model inference (MMI), an innovative method of mathematically combining different models to allow risk estimates to be based on several plausible dose–response models rather than just relying on a single model of choice. This procedure thereby produces more reliable risk estimates based on a more comprehensive appraisal of model uncertainties. For CVD, MMI yielded a weak dose–response (with a risk estimate of about one-third of the LNT model) below a step at 0.6 Gy and a stronger dose–response at higher doses. The calculated risk estimates are consistent with zero risk below this threshold-dose. For mortalities related to cardiovascular diseases, an LNT-type dose–response was found with risk estimates consistent with zero risk below 2.2 Gy based on 90% confidence intervals. The MMI approach described here resolves a dilemma in practical radiation protection when one is forced to select between models with profoundly different dose–responses for risk estimates
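
    One standard way to combine a sub-set of roughly equally well-fitting models is Akaike-weight averaging (the paper's exact weighting scheme may differ); a minimal sketch:

```python
import numpy as np

def multimodel_estimate(aics, estimates):
    """Akaike-weight model averaging over a set of plausible models.

    w_i = exp(-0.5 * delta_i) / sum_j exp(-0.5 * delta_j),
    where delta_i = AIC_i - min(AIC). The averaged risk estimate is the
    weight-combined estimate across all plausible dose-response models.
    """
    aics = np.asarray(aics, float)
    delta = aics - aics.min()
    w = np.exp(-0.5 * delta)
    w /= w.sum()
    return w, float(w @ np.asarray(estimates, float))
```

    Two models differing by 2 AIC units receive weights of roughly 0.73 and 0.27, so the better-fitting model dominates without the worse one being discarded outright.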

    Uncertainty analysis using Bayesian Model Averaging: a case study of input variables to energy models and inference to associated uncertainties of energy scenarios

    Background: Energy models are used to illustrate, calculate and evaluate energy futures under given assumptions. The results of energy models are energy scenarios representing uncertain energy futures. Methods: The discussed approach for uncertainty quantification and evaluation is based on Bayesian Model Averaging for input variables to quantitative energy models. If the premise is accepted that the energy model results cannot be less uncertain than the input to energy models, the proposed approach provides a lower bound of the associated uncertainty. The evaluation of model-based energy scenario uncertainty in terms of input variable uncertainty, departing from a probabilistic assessment, is discussed. Results: The result is an explicit uncertainty quantification for input variables of energy models based on well-established measure and probability theory. The quantification of uncertainty helps to assess the predictive potential of energy scenarios and allows an evaluation of possible consequences as promoted by energy scenarios in a highly uncertain economic, environmental, political and social target system. Conclusions: If societal decisions are vested in computed model results, it is meaningful to accompany these with an uncertainty assessment. Bayesian Model Averaging (BMA) for input variables of energy models could add to the currently limited tools for uncertainty assessment of model-based energy scenarios
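
    BMA over an input variable yields a mixture predictive distribution; sampling from it gives the lower-bound uncertainty the abstract describes. A minimal sketch with hypothetical sampler callables:

```python
import numpy as np

def bma_mixture_sample(model_probs, model_samplers, n=10000, seed=0):
    """Draw samples of an energy-model input variable from a BMA mixture.

    Each candidate model contributes its predictive distribution (here a
    callable taking an rng), weighted by its posterior model probability;
    the mixture spread is the uncertainty any downstream scenario inherits.
    """
    rng = np.random.default_rng(seed)
    choices = rng.choice(len(model_probs), size=n, p=model_probs)
    return np.array([model_samplers[k](rng) for k in choices])
```

    If two equally probable models predict very different values, the mixture's variance far exceeds either component's, which is exactly the model uncertainty a single-model analysis would understate.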

    ART: A machine learning Automated Recommendation Tool for synthetic biology

    Biology has changed radically in the last two decades, transitioning from a descriptive science into a design science. Synthetic biology allows us to bioengineer cells to synthesize novel valuable molecules such as renewable biofuels or anticancer drugs. However, traditional synthetic biology approaches involve ad-hoc engineering practices, which lead to long development times. Here, we present the Automated Recommendation Tool (ART), a tool that leverages machine learning and probabilistic modeling techniques to guide synthetic biology in a systematic fashion, without the need for a full mechanistic understanding of the biological system. Using sampling-based optimization, ART provides a set of recommended strains to be built in the next engineering cycle, alongside probabilistic predictions of their production levels. We demonstrate the capabilities of ART on simulated data sets, as well as experimental data from real metabolic engineering projects producing renewable biofuels, hoppy flavored beer without hops, and fatty acids. Finally, we discuss the limitations of this approach, and the practical consequences of the underlying assumptions failing
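
    The recommend-then-predict loop can be caricatured with a bootstrapped surrogate ensemble standing in for ART's probabilistic model; the function name and the linear surrogate are illustrative assumptions, not ART's implementation.

```python
import numpy as np

def recommend_strains(X_obs, y_obs, candidates, n_top=5, n_models=50, seed=0):
    """ART-style sketch: probabilistic surrogate + ranking of candidates.

    An ensemble of bootstrapped linear surrogates predicts production for
    candidate designs; the ensemble spread gives a rough predictive
    uncertainty, and the top candidates by mean predicted production are
    recommended for the next engineering cycle.
    """
    rng = np.random.default_rng(seed)
    n = len(y_obs)
    C = np.column_stack([np.ones(len(candidates)), candidates])
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, n)          # bootstrap the training data
        Z = np.column_stack([np.ones(n), X_obs[idx]])
        beta, *_ = np.linalg.lstsq(Z, y_obs[idx], rcond=None)
        preds.append(C @ beta)
    preds = np.array(preds)
    mean, sd = preds.mean(axis=0), preds.std(axis=0)
    order = np.argsort(-mean)[:n_top]        # best predicted producers first
    return order, mean[order], sd[order]
```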